Low Latency Geo-distributed Data Analytics – Public Review

Author

  • Mohammad Alizadeh

Abstract

Large cloud service providers ingest massive amounts of data in geographically distributed sites spread across the globe. Analytics for such planetary-scale datasets is an important emerging challenge. The current practice is to copy all data to a central location, where it can be dealt with locally by standard data analytics stacks such as Hadoop and Spark. However, transferring large volumes of data over the WAN is very costly and slow, and may not even be possible in certain cases due to data sovereignty concerns and legal restrictions.

Similar resources

Energy-efficient Analytics for Geographically Distributed Big Data

Big data analytics on geographically distributed datasets (across data centers or clusters) has been attracting increasing interest in both academia and industry, posing significant complications for system and algorithm design. In this article, we systematically investigate the geo-distributed big-data analytics framework by analyzing the fine-grained paradigm and the key design principles. W...

Towards Reliable (and Efficient) Job Executions in a Practical Geo-distributed Data Analytics System

Geo-distributed data analytics are increasingly common to derive useful information in large organisations. Naive extension of existing cluster-scale data analytics systems to the scale of geo-distributed data centers faces unique challenges including WAN bandwidth limits, regulatory constraints, changeable/unreliable runtime environment, and monetary costs. Our goal in this work is to develop ...

CLARINET: WAN-Aware Optimization for Analytics Queries

Recent work has made the case for geo-distributed analytics, where data collected and stored at multiple datacenters and edge sites world-wide is analyzed in situ to drive operational and management decisions. A key issue in such systems is ensuring low response times for analytics queries issued against geo-distributed data. A central determinant of response time is the query execution plan (Q...

Towards a Leaner Geo-distributed Cloud Infrastructure

Modern cloud infrastructures are geo-distributed. Geo-distribution offers many advantages but can increase the total cloud capacity required. To achieve low latency, geo-distribution forfeits statistical multiplexing of demand that a single data center could benefit from. Geo-distribution also complicates software design due to storage consistency issues. On the other hand, geo-distribution can l...

Getafix: Workload-aware Distributed Interactive Analytics

Distributed interactive analytics engines (Druid, Redshift, Pinot) need to achieve low query latency while using the least storage space. This paper presents a solution to the problem of replication of data blocks and routing of queries. Our techniques decide the replication level of individual data blocks (based on popularity, access counts), as well as output optimal placement patterns for su...

Publication date: 2015